Search results for "Forbidden word"

showing 8 items of 8 documents

Text Compression Using Antidictionaries

1999

International audience; We give a new text compression scheme based on Forbidden Words ("antidictionary"). We prove that our algorithms attain the entropy for balanced binary sources. They run in linear time. Moreover, one of the main advantages of this approach is that it produces very fast decompressors. A second advantage is a synchronization property that is helpful to search compressed data and allows parallel compression. Our algorithms can also be presented as "compilers" that create compressors dedicated to any previously fixed source. The techniques used in this paper are from Information Theory and Finite Automata.

Theoretical computer scienceFinite-state machineComputer science[INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS]010102 general mathematicsforbidden wordData_CODINGANDINFORMATIONTHEORY0102 computer and information sciencesInformation theory01 natural sciencesfinite automatonParallel compressionpattern matching010201 computation theory & mathematicsEntropy (information theory)Pattern matching0101 mathematicsTime complexityAlgorithmdata compressioninformation theoryData compression
researchProduct

Alignment-free sequence comparison using absent words

2018

Sequence comparison is a prerequisite to virtually all comparative genomic analyses. It is often realised by sequence alignment techniques, which are computationally expensive. This has led to increased research into alignment-free techniques, which are based on measures referring to the composition of sequences in terms of their constituent patterns. These measures, such as $q$-gram distance, are usually computed in time linear with respect to the length of the sequences. In this paper, we focus on the complementary idea: how two sequences can be efficiently compared based on information that does not occur in the sequences. A word is an {\em absent word} of some sequence if it does not oc…

0301 basic medicineFOS: Computer and information sciencesFormal Languages and Automata Theory (cs.FL)Computer Science - Formal Languages and Automata TheorySequence alignmentInformation System0102 computer and information sciencesCircular wordAbsent words01 natural sciencesUpper and lower boundsSequence comparisonTheoretical Computer ScienceCombinatorics03 medical and health sciencesComputer Science - Data Structures and AlgorithmsData Structures and Algorithms (cs.DS)Absent wordCircular wordsMathematicsSequenceSettore INF/01 - InformaticaProcess (computing)q-gramComputer Science Applications1707 Computer Vision and Pattern Recognitionq-gramsComposition (combinatorics)Computer Science Applications030104 developmental biologyComputational Theory and MathematicsForbidden words010201 computation theory & mathematicsFocus (optics)Forbidden wordWord (computer architecture)Information SystemsInteger (computer science)
researchProduct

A Characterization of Bispecial Sturmian Words

2012

A finite Sturmian word w over the alphabet {a,b} is left special (resp. right special) if aw and bw (resp. wa and wb) are both Sturmian words. A bispecial Sturmian word is a Sturmian word that is both left and right special. We show as a main result that bispecial Sturmian words are exactly the maximal internal factors of Christoffel words, that are words coding the digital approximations of segments in the Euclidean plane. This result is an extension of the known relation between central words and primitive Christoffel words. Our characterization allows us to give an enumerative formula for bispecial Sturmian words. We also investigate the minimal forbidden words for the set of Sturmian wo…

CombinatoricsChristoffel symbolsApproximations of πEuclidean geometrySturmian wordAlphabetMathematicsSturmian words Christoffel words special factors minimal forbidden words enumerative formula
researchProduct

Minimal forbidden words and factor automata

1998

International audience; Let L(M) be the (factorial) language avoiding a given antifactorial language M. We design an automaton accepting L(M) and built from the language M. The construction is eff ective if M is finite. If M is the set of minimal forbidden words of a single word v, the automaton turns out to be the factor automaton of v (the minimal automaton accepting the set of factors of v). We also give an algorithm that builds the trie of M from the factor automaton of a single word. It yields a non-trivial upper bound on the number of minimal forbidden words of a word.

TheoryofComputation_COMPUTATIONBYABSTRACTDEVICESfailure functionfactor code[INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS]Büchi automatonComputerApplications_COMPUTERSINOTHERSYSTEMS[INFO.INFO-DS] Computer Science [cs]/Data Structures and Algorithms [cs.DS]0102 computer and information sciencesavoiding a wordω-automaton01 natural sciencesfactorial languageReversible cellular automatonCombinatoricsDeterministic automatonanti-factorial languageNondeterministic finite automaton0101 mathematicsMathematicsfactor automatonPowerset constructionLevenshtein automaton010102 general mathematicsforbidden wordComputer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing)16. Peace & justiceNonlinear Sciences::Cellular Automata and Lattice GasesTheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES010201 computation theory & mathematicsProbabilistic automatonPhysics::Accelerator PhysicsComputer Science::Programming LanguagesHigh Energy Physics::ExperimentComputer Science::Formal Languages and Automata Theory
researchProduct

Automata and differentiable words

2011

We exhibit the construction of a deterministic automaton that, given k > 0, recognizes the (regular) language of k-differentiable words. Our approach follows a scheme of Crochemore et al. based on minimal forbidden words. We extend this construction to the case of C\infinity-words, i.e., words differentiable arbitrary many times. We thus obtain an infinite automaton for representing the set of C\infinity-words. We derive a classification of C\infinity-words induced by the structure of the automaton. Then, we introduce a new framework for dealing with \infinity-words, based on a three letter alphabet. This allows us to define a compacted version of the automaton, that we use to prove that ev…

Discrete mathematicsKolakoski wordGeneral Computer ScienceC∞-wordsPowerset constructionTimed automatonPushdown automatonBüchi automatonComputer Science - Formal Languages and Automata TheoryComputer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing)68R15AutomataTheoretical Computer ScienceCombinatoricsForbidden wordsDeterministic automatonProbabilistic automatonTwo-way deterministic finite automatonNondeterministic finite automatonC∞ -wordForbidden wordComputer Science::Formal Languages and Automata TheoryComputer Science(all)Computer Science - Discrete MathematicsMathematicsTheoretical Computer Science
researchProduct

On the Structure of Bispecial Sturmian Words

2013

A balanced word is one in which any two factors of the same length contain the same number of each letter of the alphabet up to one. Finite binary balanced words are called Sturmian words. A Sturmian word is bispecial if it can be extended to the left and to the right with both letters remaining a Sturmian word. There is a deep relation between bispecial Sturmian words and Christoffel words, that are the digital approximations of Euclidean segments in the plane. In 1997, J. Berstel and A. de Luca proved that \emph{palindromic} bispecial Sturmian words are precisely the maximal internal factors of \emph{primitive} Christoffel words. We extend this result by showing that bispecial Sturmian wo…

FOS: Computer and information sciencesGeneral Computer ScienceSpecial factorDiscrete Mathematics (cs.DM)Computer Networks and CommunicationsApproximations of πFormal Languages and Automata Theory (cs.FL)Computer Science - Formal Languages and Automata TheoryEnumerative formula68R15Characterization (mathematics)Minimal forbidden wordTheoretical Computer ScienceCombinatoricsComputer Science::Discrete MathematicsEuclidean geometryPhysics::Atomic PhysicsMathematicsChristoffel symbolsApplied MathematicsPalindromeSturmian wordSturmian wordComputer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing)Combinatorics on wordsComputational Theory and MathematicsWord (group theory)Computer Science::Formal Languages and Automata TheoryChristoffel wordComputer Science - Discrete Mathematics
researchProduct

Word assembly through minimal forbidden words

2006

AbstractWe give a linear-time algorithm to reconstruct a finite word w over a finite alphabet A of constant size starting from a finite set of factors of w verifying a suitable hypothesis. We use combinatorics techniques based on the minimal forbidden words, which have been introduced in previous papers. This improves a previous algorithm which worked under the assumption of stronger hypothesis.

General Computer ScienceFragment assemblyFactor automaton[INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS][INFO.INFO-DS] Computer Science [cs]/Data Structures and Algorithms [cs.DS]0102 computer and information sciences02 engineering and technology01 natural sciencesMinimal forbidden wordTheoretical Computer ScienceCombinatorics0202 electrical engineering electronic engineering information engineeringFinite setComputingMilieux_MISCELLANEOUSCombinatorics on wordMathematicsShortest superstringCombinatorics on wordsRepetition index16. Peace & justice010201 computation theory & mathematics020201 artificial intelligence & image processingAlphabetConstant (mathematics)Word (computer architecture)Computer Science::Formal Languages and Automata TheoryComputer Science(all)
researchProduct

Linear-time sequence comparison using minimal absent words & applications

2016

Sequence comparison is a prerequisite to virtually all comparative genomic analyses. It is often realized by sequence alignment techniques, which are computationally expensive. This has led to increased research into alignment-free techniques, which are based on measures referring to the composition of sequences in terms of their constituent patterns. These measures, such as q-gram distance, are usually computed in time linear with respect to the length of the sequences. In this article, we focus on the complementary idea: how two sequences can be efficiently compared based on information that does not occur in the sequences. A word is an absent word of some sequence if it does not occur in…

0301 basic medicineLatin AmericansComputer Science (all)Library science0102 computer and information sciencesCircular wordAlgorithms on string01 natural sciencesAlignmentfree comparisonSequence comparisonTheoretical Computer Science03 medical and health sciences030104 developmental biology010201 computation theory & mathematicsInformaticsPolitical scienceAbsent wordForbidden word
researchProduct